KNN-WG Formula


KNN-WG works by assessing how similar the weather on a given day is to the weather
on previous days. To simulate weather variables for a new day (t+1), it first selects
days from the historical record whose characteristics are similar to those simulated
for day t. One of these nearest neighbors is then chosen according to a predefined
probability distribution or kernel, and the values observed on the day immediately
following that neighbor are adopted as the simulated values for day t+1 (Sharif et
al., 2007). The software follows the steps below; for more detail, see Sharif and
Burn (2006).


Step 1: Assemble the daily historical records of the target variables for each
station. If desired, you can calculate the regional means across the stations
before inputting the data into the tool.


Step 2: Select the current day for analysis. To predict data for tomorrow, you
need the data for today, which is represented as the vector $X_t$.
Step 3: Choose the size of the neighbor matrix, denoted as L. To do so, select a
temporal window of width w, which is typically set to 14 days. This window spans
one week before and one week after the current day, so it covers w+1 days in each
year. Therefore, if you have N years of historical data, and the current day itself
is excluded, L is given by:

$$L = N \times (w + 1) - 1$$
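The window selection in Step 3 can be sketched as follows. This is only an illustration, not the tool's own code: the function name `window_mask`, the 365-day calendar without leap days, and the synthetic 30-year record are all assumptions. It follows the convention in Sharif and Burn (2006), where the window holds the current day plus w/2 days on either side in every year, and the current day itself is then excluded.

```python
import numpy as np

def window_mask(doy, year, current_doy, current_year, w=14, days_in_year=365):
    """Boolean mask of potential neighbors of the current day (hypothetical helper)."""
    diff = np.abs(doy - current_doy)
    # Circular distance so that the window wraps around the turn of the year.
    circular = np.minimum(diff, days_in_year - diff)
    in_window = circular <= w // 2
    # The current day itself is not a neighbor.
    is_current = (doy == current_doy) & (year == current_year)
    return in_window & ~is_current

# Synthetic calendar: 30 years of 365 days each (assumption, no leap days).
n_years = 30
doy = np.tile(np.arange(1, 366), n_years)
year = np.repeat(np.arange(1981, 1981 + n_years), 365)

mask = window_mask(doy, year, current_doy=100, current_year=1995)
L = int(mask.sum())  # 30 * 15 - 1 = 449 candidate days
```

Counting the selected days gives 15 days per year over 30 years, minus the current day.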


Step 4: Calculate the neighbor matrix, denoted as $C_t$. This matrix has
dimensions L×P, where P is the number of weather variables.


Step 5: Determine the covariance matrix for day t, $\mathrm{cov}(C_t)$, from this
L×P data block.


Step 6: Calculate the Mahalanobis distances (as described by Davis, 1986) between
the vector $X_t$ and each vector $X_i$ in the neighbor matrix, where i ranges from
1 to L. Both $X_t$ and $X_i$ have dimensions 1×P. The Mahalanobis distance is
defined as:

$$d_i = \sqrt{(X_t - X_i)\,\mathrm{cov}(C_t)^{-1}\,(X_t - X_i)^T}$$

where T denotes the transpose operation, and $\mathrm{cov}(C_t)^{-1}$ is the inverse
of the covariance matrix.
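Steps 5 and 6 can be sketched in NumPy as below. The function name `mahalanobis_distances` and the array layout (neighbors as an L×P array, the current day as a length-P vector) are assumptions made for illustration. Using a pseudo-inverse instead of a plain inverse is also a choice made here, so that a singular covariance matrix does not raise an error; it is not something the text prescribes.

```python
import numpy as np

def mahalanobis_distances(x_t, neighbors):
    """Distance from x_t (shape (P,)) to every row of neighbors (shape (L, P))."""
    # Step 5: P x P covariance matrix of the neighbor block (columns = variables).
    cov = np.atleast_2d(np.cov(neighbors, rowvar=False))
    # Pseudo-inverse is an assumption; it equals the inverse when cov is invertible.
    cov_inv = np.linalg.pinv(cov)
    # Step 6: quadratic form (X_t - X_i) cov^-1 (X_t - X_i)^T for each row i.
    diff = neighbors - x_t
    d2 = np.einsum("ij,jk,ik->i", diff, cov_inv, diff)
    return np.sqrt(np.maximum(d2, 0.0))

# Synthetic example: L = 50 neighbors, P = 3 variables.
rng = np.random.default_rng(0)
neighbors = rng.normal(size=(50, 3))
x_t = neighbors.mean(axis=0)
d = mahalanobis_distances(x_t, neighbors)
```

Unlike a Euclidean distance, this measure rescales each variable by its variance and accounts for correlation between variables, so temperature and precipitation, for example, can be compared without prior standardization.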
Step 7: Choose the number K of nearest neighbors to retain for resampling out of
the total L neighbors; one of these K nearest neighbors will later be selected.
K can be determined in several ways. Lall and Sharma (1996) suggested using the
generalized cross-validation (GCV) score to choose K. Alternatively, Rajagopalan and
Lall (1999) and Yates et al. (2003) recommended a heuristic method for selecting K:

$$K = \sqrt{L}$$


Step 8: Sort the Mahalanobis distances ($d_i$) in ascending order and select the K
nearest neighbors from the top of the sorted list. The resulting array of K sorted
distances is denoted $ds$.
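A minimal sketch of Steps 7 and 8, assuming the square-root heuristic for K and rounding to the nearest integer (the text does not say how a non-integer square root is handled, so the rounding is an assumption). The function name `k_nearest` is hypothetical. It returns the positions of the K nearest neighbors in the original distance array together with the sorted distances, so that a selected distance can later be traced back to its day.

```python
import math
import numpy as np

def k_nearest(distances, k=None):
    """Return (indices, sorted distances) of the K smallest distances."""
    distances = np.asarray(distances, dtype=float)
    if k is None:
        # Heuristic K = sqrt(L); rounding is an assumption of this sketch.
        k = max(1, round(math.sqrt(distances.size)))
    # Stable sort keeps the original order among tied distances.
    order = np.argsort(distances, kind="stable")[:k]
    return order, distances[order]

# Example with L = 4 distances, so K = 2.
order, ds = k_nearest([0.9, 0.1, 0.5, 0.3])
```

Keeping the index array alongside the sorted distances avoids a separate search of the distance array when the selected neighbor has to be mapped back to a date.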


Step 9: Calculate the weight $W_j$ for the j-th neighbor and compute the cumulative
probabilities $P_j$ using the following formulas:

$$W_j = \frac{1/j}{\sum_{i=1}^{K} 1/i}$$

$$P_j = \sum_{i=1}^{j} W_i$$
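The weights and cumulative probabilities of Step 9 can be sketched as below, assuming the decreasing kernel of Lall and Sharma (1996), in which the j-th nearest neighbor receives a weight proportional to 1/j. The function name `kernel_weights` is hypothetical.

```python
import numpy as np

def kernel_weights(k):
    """Weights W_j = (1/j) / sum_{i=1..K}(1/i) and cumulative probabilities P_j."""
    ranks = np.arange(1, k + 1)
    weights = (1.0 / ranks) / np.sum(1.0 / ranks)
    cum_probs = np.cumsum(weights)
    return weights, cum_probs

# For K = 3 the weights are 6/11, 3/11 and 2/11.
weights, cum_probs = kernel_weights(3)
```

The closest neighbor always gets the largest weight, and the cumulative probabilities end at 1, which is what allows a single uniform random number to pick a neighbor in the next step.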
Step 10: Choose a random number r between 0 and 1. If r is less than $P_1$, select
$ds_1$. If r equals $P_K$, select $ds_K$. For values of j where $P_{j-1}$ is less
than r and r is less than $P_j$, choose $ds_j$.
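The selection rule of Step 10 amounts to finding the interval of cumulative probabilities that contains r. A small sketch, with the hypothetical helper `pick_neighbor` returning a 0-based position in the sorted array; treating the boundary case r = P_j as belonging to neighbor j is an assumption made here so that every value of r maps to exactly one neighbor.

```python
import numpy as np

def pick_neighbor(cum_probs, r):
    """Return the 0-based index j such that P_{j-1} < r <= P_j."""
    j = int(np.searchsorted(cum_probs, r, side="left"))
    # Guard against floating-point round-off leaving the last P_j slightly below 1.
    return min(j, len(cum_probs) - 1)

cum_probs = np.array([0.5, 0.8, 1.0])
first = pick_neighbor(cum_probs, 0.2)   # r < P_1, so the nearest neighbor
middle = pick_neighbor(cum_probs, 0.6)  # P_1 < r < P_2, so the second neighbor
last = pick_neighbor(cum_probs, 1.0)    # r = P_K, so the K-th neighbor

# In a simulation, r would come from a uniform random generator.
rng = np.random.default_rng(42)
sampled = pick_neighbor(cum_probs, rng.random())
```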


Step 11: Locate the selected value of $ds$ in the Mahalanobis distances array
($d_i$) and save the weather variables' values for the selected day as the
$X_{t+1}$ data.


Step 12: Replace $X_t$ with $X_{t+1}$ and repeat steps 1 to 12.


Reference: Improved K-Nearest Neighbor Weather Generating Model


What is the basis of the ensemble?


In KNN-WG, we have introduced an ensemble method of nr runs. In this approach, we
calculate a weight for each run and then multiply each run's output by its weight.


dV; = (ee _ yon): 
nr: Number of runs
$V^{ens}$: Ensemble-averaged value of the variable
$V_i^{knn}$: Output of the KNN model in the i-th run
$\bar{V}^{obs}$: Mean of the observed variable in the calibration period
$\bar{V}^{knn}$: Mean of the output of the KNN model in the calibration period